Effects of Training Set Expansion in Handwriting Recognition Using Synthetic Data
Abstract
We present a perturbation model for generating synthetic textlines from existing cursively handwritten lines of text produced by human writers. Our goal is to improve the performance of an off-line cursive handwriting recognition system by providing it with additional synthetic training data. Adding synthetic training data can be expected to increase the variability of the training set, which should lead to a higher recognition rate. On the other hand, synthetic training data may bias a recognizer towards unnatural handwriting styles, which could lower the recognition rate. In this paper we show that, for certain configurations of the recognizer parameters and the synthetic handwriting generation process, recognition performance can be significantly improved.
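As a rough illustration of training set expansion by perturbation, the sketch below derives synthetic variants of a binary textline image and adds them to the training set under the original transcription. The abstract does not specify the perturbations or their parameters, so everything here is an assumption made for illustration: the horizontal shear, the one-pixel stroke thickening, and the names `shear_line`, `thicken`, `expand_training_set`, `max_slant`, and `n_synthetic` are hypothetical and are not the authors' model.

```python
import numpy as np

def shear_line(img, slant, fill=0):
    """Shift each row sideways in proportion to its height above the bottom row."""
    h, w = img.shape
    max_shift = int(np.ceil(abs(slant) * (h - 1)))
    # Widen the canvas so that no ink is pushed outside the image.
    out = np.full((h, w + 2 * max_shift), fill, dtype=img.dtype)
    for y in range(h):
        shift = int(round(slant * (h - 1 - y)))
        start = max_shift + shift
        out[y, start:start + w] = img[y]
    return out

def thicken(img):
    """Dilate a binary image (1 = ink) by one pixel using a 4-neighbourhood."""
    p = np.pad(img, 1)  # zero padding, so borders do not gain spurious ink
    return (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
            | p[1:-1, :-2] | p[1:-1, 2:])

def expand_training_set(lines, n_synthetic, max_slant=0.3, seed=0):
    """Return the natural (image, text) pairs plus n_synthetic perturbed copies of each."""
    rng = np.random.default_rng(seed)
    expanded = []
    for img, text in lines:
        expanded.append((img, text))  # always keep the natural line
        for _ in range(n_synthetic):
            synth = shear_line(img, rng.uniform(-max_slant, max_slant))
            if rng.random() < 0.5:  # assumed: thicken about half of the copies
                synth = thicken(synth)
            expanded.append((synth, text))  # transcription is unchanged
    return expanded
```

In this sketch, `max_slant` controls how far the synthetic lines depart from the natural ones, and `n_synthetic` controls how much the training set grows. These correspond to the two competing effects described above: more variability in the training set versus a possible bias towards unnatural handwriting styles.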
Similar resources
Off-line cursive handwriting recognition using synthetic training data
The objective of this thesis is to investigate the generation and use of synthetic training data for off-line cursive handwriting recognition. It has been shown in many works before that the size and quality of the training data have a great impact on the performance of handwriting recognition systems. A general observation is that the more texts are used for training, the better recognition per...
Training Set Expansion in Handwritten Character Recognition
In this paper, a process of expansion of the training set by synthetic generation of handwritten uppercase letters via deformations of natural images is tested in combination with an approximate k-Nearest Neighbor (k-NN) classifier. It has been previously shown [11] [10] that approximate nearest neighbors search in large databases can be successfully used in an OCR task, and that significant pe...
Generation of Synthetic Training Data for an HMM-based Handwriting Recognition System
A perturbation model for generating synthetic textlines from existing cursively handwritten lines of text produced by human writers is presented. Our purpose is to improve the performance of an HMM-based off-line cursive handwriting recognition system by providing it with additional synthetic training data. Two kinds of perturbations are applied, geometrical transformations and thinning/thicken...
Parameter calibration for synthesizing realistic-looking variability in offline handwriting
Motivated by the widely accepted principle that more training data yields better recognition performance, we conducted experiments in which human subjects were tested on a mixture of real English handwritten textlines and textlines altered from existing handwriting with various degrees of distortion. The idea of generating synthetic handwriting is based on a perturbatio...
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
To facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
Publication date: 2003